Filled Pause
Research Center

Investigating 'um' and 'uh' and other hesitation phenomena

December 1st, 2020

Virtual trip 2 Sweden 4 NLP4CALL

Well, the Covid-19 situation continues, and conference organizers are gradually adapting and finding ways to hold productive events online. I participated in another event this past week which was quite well-run and an interesting conference, to boot. Unfortunately, though, I didn't give a presentation of my own.

The event was the Swedish Language Technology Conference (SLTC) with a workshop on the side called Natural Language Processing for Computer-assisted Language Learning (NLP4CALL). While the SLTC event was fairly interesting, my main focus was NLP4CALL. All of the presentations were very interesting, though none were directly about filled pauses or hesitation phenomena. Yet they inspired some ideas of my own, which is what a good conference should do.

So, one presentation that I found inspiring was by Andrew Caines and colleagues on "The Teacher-Student Chatroom Corpus" (info). They are building a corpus of communications between teachers and students via text-based chat. At the moment, their analysis is largely based on categorizing the contributions of each participant in terms of conversation analysis sequences (e.g., opening, topic, disruption, closing). What was particularly fascinating is how learners at different levels show a predilection toward different types of sequences: with lower-level learners, teachers do much more eliciting, while higher-level learners engage more often in enquiry; teachers and learners at all levels engage in repair sequences at about the same rate.

This presentation got me thinking about how, or even whether, teachers and learners would use filled pauses in chat. Of course, in pure conversational chat between friends, filled pauses are becoming common (or at least, not uncommon). But would that trend carry over into more formal communication like this? And if so, what would it mean? Would a teacher use a filled pause to draw attention to a particular pedagogical point? Would a learner use one to signal to the teacher that they're having some difficulty? Would either of them use filled pauses in the friendly manner typical of casual chat? These would be great questions to explore further with a corpus like this.

The keynote address was given by Magali Paquot, who talked about "Crowdsourcing as a means to democratize access to L2 enriched data: the case of L2 proficiency". This was an excellent presentation, both well-delivered and thought-provoking. She talked, pretty much as the title suggests, about the reliability of using crowd-sourcing for language proficiency assessment and came down squarely on the side of promoting crowd-sourcing. I don't disagree, though I wonder if it depends very much on what we're assessing. If the aim is to get an accurate estimate of a learner's general proficiency, then I think crowd-sourcing is likely a reliable way of doing this. But if we're interested in assessing some component of proficiency (say, one of the big three: complexity, accuracy, fluency; or a component of communicative competence: e.g., linguistic, strategic, sociolinguistic, or discourse), then I wonder how reliable "the crowd" will be. I mean, I think they can be reliable, in the sense of giving relatively consistent results. But will they be valid? Will they actually reflect what it is we're trying to measure? Ultimately, I suppose that would depend on what instructions we give to the crowd raters, but aren't we then just effectively turning them into pseudo-experts?
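As an aside, reliability in this sense is just inter-rater agreement, and it can be quantified with statistics like Fleiss' kappa. Here is a minimal sketch, entirely my own illustration rather than anything from the talk, computing Fleiss' kappa over some hypothetical crowd proficiency ratings. A high kappa would tell us the crowd is consistent; it would tell us nothing about whether the ratings are valid measures of, say, fluency in particular.

```python
# Fleiss' kappa: agreement among a fixed number of raters per item,
# corrected for chance. All data below are hypothetical.

def fleiss_kappa(counts):
    """counts[i][j] = number of raters who put item i in category j.
    Assumes every item is rated by the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])          # raters per item
    total = n_items * n_raters

    # Per-item agreement: proportion of agreeing rater pairs.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_item) / n_items

    # Chance agreement from the overall category proportions.
    n_cats = len(counts[0])
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    p_exp = sum(p * p for p in p_cat)

    return (p_bar - p_exp) / (1 - p_exp)

# Hypothetical: 5 learner samples, 6 crowd raters each, a 3-level
# scale (beginner / intermediate / advanced).
ratings = [
    [6, 0, 0],
    [1, 5, 0],
    [0, 4, 2],
    [0, 1, 5],
    [5, 1, 0],
]
print(f"Fleiss' kappa = {fleiss_kappa(ratings):.2f}")
```

With this toy data, kappa comes out at roughly 0.53, which counts as moderate agreement by the usual rules of thumb. That is exactly the kind of result that would still leave the validity question wide open.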

Photo by PIXNIO: https://pixnio.com/people/male-men/man-speaking-metal-can

One more presentation that generated some interest was actually on the SLTC schedule, by Christina Tånnander and Jens Edlund, titled "Self-perceived preferences of voice and speaking style characteristics in spoken text" (paper). They were interested in finding out what kinds of synthetic voices people actually prefer and carried out a survey to investigate this. Some results are a bit obvious: people prefer a voice that is soft, neither bright nor dark (not quite sure what that means), and flowing, and not shrill, nasal, or forced. But this got me wondering about preferences for authentic voices. I think surveys like this have been done before, but I wonder if they have ever been done with the depth shown here; the authors looked at a very wide range of features. I would like to see the study replicated with authentic voices. I actually suspect that there won't be much difference from the preferences for the synthetic voices. But if there is, that would indeed be all the more interesting.